Overview

Dataset Statistics

Number of Variables 11
Number of Rows 418
Missing Cells 414
Missing Cells (%) 9.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 145.4 KB
Average Row Size in Memory 356.2 B
Variable Types
  • Numerical: 3
  • Categorical: 8
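
The percentages and the average row size above follow from the raw counts. A minimal sketch of that arithmetic, assuming missing cells are divided by rows × variables and that 1 KB means 1024 bytes (the report states neither convention, and the function names are illustrative only):

```python
def missing_cell_pct(n_rows, n_cols, n_missing):
    """Share of missing cells among all cells, in percent."""
    return 100.0 * n_missing / (n_rows * n_cols)


def avg_row_size_bytes(total_kb, n_rows, bytes_per_kb=1024):
    """Average in-memory size of one row in bytes (assumes 1 KB = 1024 B)."""
    return total_kb * bytes_per_kb / n_rows


# Figures taken from the Dataset Statistics block.
print(round(missing_cell_pct(418, 11, 414), 1))   # 9.0
print(round(avg_row_size_bytes(145.4, 418), 1))   # 356.2

# Per-variable missing counts reported for Age, Cabin and Fare
# add up to the dataset-level total.
print(86 + 327 + 1)                               # 414
```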

Dataset Insights

PassengerId is uniformly distributed [Uniform]
Age has 86 (20.57%) missing values [Missing]
Cabin has 327 (78.23%) missing values [Missing]
Fare is skewed [Skewed]
Name has a high cardinality: 418 distinct values [High Cardinality]
Ticket has a high cardinality: 363 distinct values [High Cardinality]
Cabin has a high cardinality: 76 distinct values [High Cardinality]
Pclass has constant length 1 [Constant Length]
SibSp has constant length 1 [Constant Length]
Parch has constant length 1 [Constant Length]
Embarked has constant length 1 [Constant Length]
Name has all distinct values [Unique]

Variables


PassengerId

numerical

Approximate Distinct Count 418
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 6688
Mean 1100.5
Minimum 892
Maximum 1309
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PassengerId is uniformly distributed

Quantile Statistics

Minimum 892
5-th Percentile 912.85
Q1 996.25
Median 1100.5
Q3 1204.75
95-th Percentile 1288.15
Maximum 1309
Range 417
IQR 208.5
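
The quantile figures above can be reproduced under two assumptions that the report does not state: that PassengerId holds the consecutive integers 892 to 1309 (consistent with 418 distinct values, the minimum, the maximum and the sum), and that percentiles use linear interpolation between ranks, as NumPy and pandas do by default. A sketch with a hypothetical `quantile` helper:

```python
def quantile(sorted_values, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation between ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


# Assumption: IDs are the consecutive integers 892..1309.
ids = list(range(892, 1310))

q1, q3 = quantile(ids, 0.25), quantile(ids, 0.75)
for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, round(quantile(ids, q), 2))

print("range", ids[-1] - ids[0])   # range 417
print("IQR", q3 - q1)              # IQR 208.5
```

Rounded to two decimals, the five printed quantiles should match the 5-th percentile, Q1, median, Q3 and 95-th percentile listed above.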

Descriptive Statistics

Mean 1100.5
Standard Deviation 120.8105
Variance 14595.1667
Sum 460009
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.1098
  • PassengerId is not normally distributed (p-value ≈ 1.34e-24)
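
The descriptive statistics can be checked the same way. The sketch below again assumes consecutive IDs 892 to 1309, and further assumes that the variance uses the sample form (divisor n − 1) while skewness and kurtosis use population moments, with kurtosis given as excess kurtosis. These conventions are inferred from the reported numbers, not stated by the report, and `describe` is an illustrative name.

```python
from math import sqrt


def describe(values):
    """Mean, sample variance/std, CV, moment skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d ** 2 for d in dev) / n   # population central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    variance = m2 * n / (n - 1)         # sample variance (assumed ddof = 1)
    std = sqrt(variance)
    return {
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,   # excess kurtosis
    }


stats = describe(list(range(892, 1310)))  # assumption: consecutive IDs
print({name: round(value, 4) for name, value in stats.items()})
```

A kurtosis near −1.2 with zero skewness is what a uniform distribution produces, which agrees with the "uniformly distributed" insight.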

Pclass

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 27588
  • The most frequent value (3) occurs over 2.04 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 3
2nd row 3
3rd row 2
4th row 3
5th row 3

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 418
  • The top 2 categories (3, 1) take over 50.0%
  • Pclass has words of constant length

Name

categorical

Approximate Distinct Count 418
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 38658

Length

Mean 27.4833
Standard Deviation 9.9712
Median 25
Minimum 13
Maximum 63

Sample

1st row Kelly, Mr. James
2nd row Wilkes, Mrs. James...
3rd row Myles, Mr. Thomas ...
4th row Wirz, Mr. Albert
5th row Hirvonen, Mrs. Ale...

Letter

Count 9133
Lowercase Letter 7395
Space Separator 1309
Uppercase Letter 1738
Dash Punctuation 6
Decimal Number 0
  • The most frequent word (mr) occurs over 3.1 times as often as the second most frequent word (miss)

Sex

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Memory Size 29146
  • The most frequent value (male) occurs over 1.75 times as often as the second most frequent value (female)

Length

Mean 4.7273
Standard Deviation 0.9632
Median 4
Minimum 4
Maximum 6

Sample

1st row male
2nd row female
3rd row male
4th row male
5th row female

Letter

Count 1976
Lowercase Letter 1976
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
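
The report does not list category counts for Sex, but they can be recovered from the length and letter figures. The sketch below assumes every value is exactly "male" or "female" (two distinct values, lengths between 4 and 6, lowercase letters only) and that the length standard deviation is the sample form; the variable names are illustrative.

```python
from statistics import mean, stdev

n_rows = 418           # Number of Rows
total_letters = 1976   # Letter Count reported for Sex
len_male, len_female = len("male"), len("female")

# Solve  m + f = n_rows  and  4*m + 6*f = total_letters  for the counts.
n_female = (total_letters - len_male * n_rows) // (len_female - len_male)
n_male = n_rows - n_female

lengths = [len_male] * n_male + [len_female] * n_female

print(n_male, n_female)           # 266 152
print(n_male / n_female)          # 1.75
print(round(mean(lengths), 4))    # 4.7273
print(round(stdev(lengths), 4))   # 0.9632
```

The ratio of 1.75 matches the insight for Sex, which indicates that the insight compares how often each value occurs rather than the values themselves.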
  • The top 2 categories (male, female) take over 50.0%

Age

numerical

Approximate Distinct Count 79
Approximate Unique (%) 23.8%
Missing 86
Missing (%) 20.6%
Infinite 0
Infinite (%) 0.0%
Memory Size 5312
Mean 30.2726
Minimum 0.17
Maximum 76
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age is skewed right (γ1 = 0.4553)

Quantile Statistics

Minimum 0.17
5-th Percentile 8
Q1 21
Median 27
Q3 39
95-th Percentile 57
Maximum 76
Range 75.83
IQR 18

Descriptive Statistics

Mean 30.2726
Standard Deviation 14.1812
Variance 201.1067
Sum 10050.5
Skewness 0.4553
Kurtosis 0.06451
Coefficient of Variation 0.4685
  • Age is not normally distributed (p-value ≈ 1.22e-05)
  • Age has 2 outliers
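
The report does not say how outliers are identified. One common convention is Tukey's rule, which flags values more than 1.5 × IQR beyond the quartiles. The sketch below applies that rule to the reported Age quartiles as an assumption, not as the report's documented method; `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences from the quartiles (Tukey's rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Q1 and Q3 as reported for Age.
low, high = tukey_fences(21, 39)
print(low, high)   # -6.0 66.0
```

Under this rule no age can fall below the lower fence, because the minimum is 0.17, so any flagged values would be ages above 66.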

SibSp

categorical

Approximate Distinct Count 7
Approximate Unique (%) 1.7%
Missing 0
Missing (%) 0.0%
Memory Size 27588
  • The most frequent value (0) occurs over 2.57 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 1
3rd row 0
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 418
  • The top 2 categories (0, 1) take over 50.0%
  • SibSp has words of constant length

Parch

categorical

Approximate Distinct Count 8
Approximate Unique (%) 1.9%
Missing 0
Missing (%) 0.0%
Memory Size 27588
  • The most frequent value (0) occurs over 6.23 times as often as the second most frequent value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 418
  • The top 2 categories (0, 1) take over 50.0%
  • Parch has words of constant length

Ticket

categorical

Approximate Distinct Count 363
Approximate Unique (%) 86.8%
Missing 0
Missing (%) 0.0%
Memory Size 30044

Length

Mean 6.8756
Standard Deviation 2.8214
Median 6
Minimum 3
Maximum 18

Sample

1st row 330911
2nd row 363272
3rd row 240276
4th row 315154
5th row 3101298

Letter

Count 353
Lowercase Letter 4
Space Separator 125
Uppercase Letter 349
Dash Punctuation 0
Decimal Number 2224

Fare

numerical

Approximate Distinct Count 169
Approximate Unique (%) 40.5%
Missing 1
Missing (%) 0.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 6672
Mean 35.6272
Minimum 0
Maximum 512.3292
Zeros 2
Zeros (%) 0.5%
Negatives 0
Negatives (%) 0.0%
  • Fare is skewed right (γ1 = 3.6739)

Quantile Statistics

Minimum 0
5-th Percentile 7.2292
Q1 7.8958
Median 14.4542
Q3 31.5
95-th Percentile 151.55
Maximum 512.3292
Range 512.3292
IQR 23.6042

Descriptive Statistics

Mean 35.6272
Standard Deviation 55.9076
Variance 3125.6571
Sum 14856.5376
Skewness 3.6739
Kurtosis 17.6931
Coefficient of Variation 1.5692
  • Fare is not normally distributed (p-value ≈ 7.28e-18)
  • Fare has 55 outliers
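
The Fare figures are internally consistent if the mean is taken over the non-missing rows only. A short check using the reported values, assuming the standard deviation is the square root of the reported variance and the coefficient of variation is the standard deviation divided by the mean:

```python
from math import sqrt

n_rows, n_missing = 418, 1
total, variance = 14856.5376, 3125.6571   # Sum and Variance as reported

n_valid = n_rows - n_missing              # rows with a Fare value
mean = total / n_valid
std = sqrt(variance)

print(round(mean, 4))        # 35.6272
print(round(std, 4))         # 55.9076
print(round(std / mean, 4))  # 1.5692
```

Dividing the sum by all 418 rows would give a lower mean than the one reported, so the missing row appears to be excluded from these statistics.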

Cabin

categorical

Approximate Distinct Count 76
Approximate Unique (%) 83.5%
Missing 327
Missing (%) 78.2%
Memory Size 6286

Length

Mean 4.0769
Standard Deviation 2.8489
Median 3
Minimum 1
Maximum 15

Sample

1st row B45
2nd row E31
3rd row B57 B59 B63 B66
4th row B36
5th row A21

Letter

Count 118
Lowercase Letter 0
Space Separator 27
Uppercase Letter 118
Dash Punctuation 0
Decimal Number 226
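
The labels in the Letter blocks (Lowercase Letter, Uppercase Letter, Space Separator, Dash Punctuation, Decimal Number) are the names of Unicode general categories (Ll, Lu, Zs, Pd, Nd). A sketch of how such counts could be produced with Python's standard `unicodedata` module, applied to the five Cabin sample rows above; whether the report computes them this way is an assumption.

```python
import unicodedata
from collections import Counter

# The five Cabin values shown under "Sample".
samples = ["B45", "E31", "B57 B59 B63 B66", "B36", "A21"]


def category_counts(values):
    """Count characters by Unicode general category (Lu, Ll, Zs, Pd, Nd, ...)."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


counts = category_counts(samples)
print(counts["Lu"], counts["Ll"], counts["Zs"], counts["Pd"], counts["Nd"])  # 8 0 3 0 16
print(counts["Lu"] + counts["Ll"])   # 8, all letters
print([len(v) for v in samples])     # [3, 3, 15, 3, 3]
```

In the reported figures, "Count" equals the uppercase plus lowercase letter counts (118 for Cabin, 9133 for Name), which the second print line mirrors for the sample.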

Embarked

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.7%
Missing 0
Missing (%) 0.0%
Memory Size 27588
  • The most frequent value (S) occurs over 2.65 times as often as the second most frequent value (C)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row Q
2nd row S
3rd row Q
4th row S
5th row S

Letter

Count 418
Lowercase Letter 0
Space Separator 0
Uppercase Letter 418
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (S, C) take over 50.0%
  • Embarked has words of constant length

Interactions

Correlations

Missing Values